Skip to content

Move dead-silo message break to connection maintainer#10090

Closed
ReubenBond wants to merge 2 commits into
dotnet:mainfrom
ReubenBond:split/pr10085-05-dead-silo-message-break
Closed

Move dead-silo message break to connection maintainer#10090
ReubenBond wants to merge 2 commits into
dotnet:mainfrom
ReubenBond:split/pr10085-05-dead-silo-message-break

Conversation

@ReubenBond

@ReubenBond ReubenBond commented May 12, 2026

Copy link
Copy Markdown
Member

Part 5 of 7 split from #10085.

Problem:
Breaking outstanding messages to a dead silo was performed from LocalGrainDirectory/Catalog-related cleanup, even though it is unrelated to grain-directory state.

Solution:
Move BreakOutstandingMessagesToSilo handling to SiloConnectionMaintainer when a remote silo is marked Dead, keeping LocalGrainDirectory focused on directory reconciliation.

Stack:
Merge after #10089. This branch is stacked on split/pr10085-04-processing-cleanup; until earlier PRs merge, GitHub may show earlier stack changes. Incremental compare: ReubenBond/orleans@split/pr10085-04-processing-cleanup...split/pr10085-05-dead-silo-message-break

Review focus:
Dead-silo message break placement and LocalGrainDirectory/Catalog separation of concerns.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR continues the directory/membership cleanup split by relocating “break outstanding messages to dead silo” into the networking layer (SiloConnectionMaintainer) and further refactoring LocalGrainDirectory to reconcile membership from IClusterMembershipService snapshots instead of status-oracle event plumbing.

Changes:

  • Move BreakOutstandingMessagesToSilo invocation to SiloConnectionMaintainer when a remote silo is marked Dead.
  • Refactor LocalGrainDirectory to process membership snapshots/updates, adjust local directory/cache via IsDefunctActivation, and take over terminating-silo activation deactivation logic previously in Catalog.
  • Update handoff/retry behavior and add targeted unit tests for defunct-activation determination.
Show a summary per file
File Description
test/Orleans.Runtime.Internal.Tests/LocalGrainDirectoryTests.cs Adds unit coverage for LocalGrainDirectory.IsDefunctActivation behavior across silo statuses/unknowns/successors.
test/Orleans.Core.Tests/Directory/CachedGrainLocatorTests.cs Updates test wiring for LocalGrainDirectory’s new IClusterMembershipService dependency.
src/Orleans.Runtime/Networking/SiloConnectionMaintainer.cs Breaks outstanding messages and closes connections when a remote silo is marked Dead.
src/Orleans.Runtime/GrainDirectory/LocalGrainDirectory.cs Moves to snapshot-driven membership processing; centralizes terminating-silo activation deactivation and defunct-entry cleanup.
src/Orleans.Runtime/GrainDirectory/ILocalGrainDirectory.cs Removes the IsSiloInCluster method from the internal interface surface.
src/Orleans.Runtime/GrainDirectory/GrainDirectoryHandoffManager.cs Adjusts successor checks, filters terminating addresses during handoff acceptance, and changes pending-operation retry/dequeue behavior.
src/Orleans.Runtime/Catalog/Catalog.cs Removes catalog’s silo-status-change directory cleanup/break-outstanding-message responsibilities.

Copilot's findings

Comments suppressed due to low confidence (1)

src/Orleans.Runtime/GrainDirectory/LocalGrainDirectory.cs:677

  • Same as RegisterAsync: hopCount > 1 skips the retry delay/re-check on the first forwarded hop (hopCount==1), despite the comment saying this should happen after the first forward and other operations using hopCount > 0. This can lead to rapid multi-hop forwarding when ownership is in flux. Consider using hopCount > 0 here (or align all forwarding methods to the same semantics).
            // After the first forward, we insert a retry delay and recheck owner before forwarding again
            if (hopCount > 1 && forwardAddress != null)
            {
                await Task.Delay(RETRY_DELAY);
                forwardAddress = this.CheckIfShouldForward(address.GrainId, hopCount, "UnregisterAsync");
  • Files reviewed: 7/7 changed files
  • Comments generated: 2

Comment thread src/Orleans.Runtime/GrainDirectory/LocalGrainDirectory.cs
Comment thread src/Orleans.Runtime/GrainDirectory/GrainDirectoryHandoffManager.cs
@ReubenBond ReubenBond force-pushed the split/pr10085-05-dead-silo-message-break branch 5 times, most recently from f611705 to 3f786dc Compare May 12, 2026 20:10
@ReubenBond ReubenBond enabled auto-merge May 12, 2026 20:11
ReubenBond and others added 2 commits May 12, 2026 13:15
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When background Orleans work logs after xUnit has cleared the current test context, ITestOutputHelper throws InvalidOperationException with "There is no currently active test." That exception can escape through Microsoft.Extensions.Logging and abort the test host.

Fall back to stderr for that specific late-log case so the original runtime log is still emitted without crashing the test process.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@ReubenBond ReubenBond force-pushed the split/pr10085-05-dead-silo-message-break branch from 3f786dc to 6f907c3 Compare May 12, 2026 20:26
@ReubenBond ReubenBond added this pull request to the merge queue May 12, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to a conflict with the base branch May 12, 2026
@ReubenBond

Copy link
Copy Markdown
Member Author

Closing as obsolete: the later stacked PR #10092 has already merged and its merge commit includes the #10090/#10091 changes, including the xUnit logger crash fix. Current main also includes the follow-up directory retry PRs #10094 and #10095.

@ReubenBond ReubenBond closed this May 12, 2026
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 12, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants